Multitask Sequence-to-Sequence Models for Grapheme-to-Phoneme Conversion
Authors
Abstract
Recently, neural sequence-to-sequence (Seq2Seq) models have been applied to the problem of grapheme-to-phoneme (G2P) conversion. These models offer a straightforward way of modeling the conversion by jointly learning the alignment and translation of input to output tokens in an end-to-end fashion. However, until now this approach has not, on its own, shown improved error rates compared to traditional joint-sequence-based n-gram models for G2P. In this paper, we investigate how multitask learning can improve the performance of Seq2Seq G2P models. A single Seq2Seq model is trained on multiple phoneme lexicon datasets covering multiple languages and phonetic alphabets. Although multi-language learning does not improve error rates, combining standard datasets and crawled data that use different phonetic alphabets of the same language yields promising error reductions for English and German Seq2Seq G2P conversion. Finally, combining Seq2Seq G2P models with standard n-gram-based models yields significant improvements over using either model alone.
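The abstract does not say how several lexicons are presented to one Seq2Seq model. The Python sketch below assumes one common convention: prepend a task token that names the dataset and phonetic alphabet to each grapheme sequence, then pool all tagged lexicons into a single training corpus. The tag names (`<en_arpabet>`, `<en_ipa>`) and the dictionary lexicon format are hypothetical. The sketch also shows how phoneme error rate, a standard G2P metric, can be computed from Levenshtein distance. It is an illustration under these assumptions, not the authors' implementation.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical lexicon format: word -> list of phoneme symbols.
Lexicon = Dict[str, List[str]]
Pair = Tuple[List[str], List[str]]


def tag_lexicon(lexicon: Lexicon, task_tag: str) -> List[Pair]:
    """Turn one lexicon into (source, target) pairs with a task tag in front.

    The tag token is an assumed way of telling a single Seq2Seq model
    which language / phonetic alphabet it should produce.
    """
    pairs = []
    for word, phonemes in lexicon.items():
        source = [task_tag] + list(word)  # tag first, then one token per grapheme
        pairs.append((source, list(phonemes)))
    return pairs


def build_multitask_corpus(lexicons: Dict[str, Lexicon]) -> List[Pair]:
    """Concatenate several tagged lexicons into one training corpus."""
    corpus: List[Pair] = []
    for tag, lexicon in lexicons.items():
        corpus.extend(tag_lexicon(lexicon, tag))
    return corpus


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference phoneme
                curr[j - 1] + 1,         # insertion of a hypothesis phoneme
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs: Sequence[Sequence[str]],
                       hyps: Sequence[Sequence[str]]) -> float:
    """Total phoneme edits divided by the total number of reference phonemes."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


if __name__ == "__main__":
    # Same word, two phonetic alphabets of the same language (ARPAbet vs. IPA).
    lexicons = {
        "<en_arpabet>": {"cat": ["K", "AE", "T"]},
        "<en_ipa>": {"cat": ["k", "æ", "t"]},
    }
    corpus = build_multitask_corpus(lexicons)
    print(corpus[0])  # (['<en_arpabet>', 'c', 'a', 't'], ['K', 'AE', 'T'])
    print(phoneme_error_rate([["K", "AE", "T"]], [["K", "EH", "T"]]))  # 0.3333333333333333
```

Because the tag is an ordinary input token, the same encoder and decoder are shared across all datasets, and only the tag tells the model which output alphabet to use at inference time.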
Similar resources
Joint-sequence models for grapheme-to-phoneme conversion
Grapheme-to-phoneme conversion is the task of finding the pronunciation of a word given its written form. It has important applications in text-to-speech and speech recognition. Joint-sequence models are a simple and theoretically stringent probabilistic framework that is applicable to this problem. This article provides a self-contained and detailed description of this method. We present a nove...
Massively Multilingual Neural Grapheme-to-Phoneme Conversion
Grapheme-to-phoneme conversion (g2p) is necessary for text-to-speech and automatic speech recognition systems. Most g2p systems are monolingual: they require language-specific data or handcrafting of rules. Such systems are difficult to extend to low-resource languages, for which data and handcrafted rules are not available. As an alternative, we present a neural sequence-to-sequence approach t...
Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM
Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural networks (DNNs) in speech recognition systems significantly improves their performance. DNN-based phoneme recognition systems have two phases: training and testing. Mos...
A latent analogy framework for grapheme-to-phoneme conversion
Data-driven grapheme-to-phoneme conversion involves either (top-down) inductive learning or (bottom-up) pronunciation by analogy. As both approaches rely on local context information, they typically require some external linguistic knowledge, e.g., individual grapheme/phoneme correspondences. To avoid such supervision, this paper proposes an alternative solution, dubbed pronunciation by latent ...
Read, Attend and Pronounce: An Attention-Based Approach for Grapheme-To-Phoneme Conversion
We propose an attention-enabled encoder-decoder model for the problem of grapheme-to-phoneme conversion. Most previous work has tackled the problem via joint sequence models that require explicit alignments for training. In contrast, the attention-enabled encoder-decoder model allows for jointly learning to align and convert characters to phonemes. With this approach, we achieve state-of-the-art...